Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty

Authors

Wei Pan

Xiaotong Shen

Binghui Liu

Abstract

Clustering analysis is widely used in many fields. Traditionally, clustering is regarded as unsupervised learning because it lacks a class label or a quantitative response variable, which, in contrast, is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty and its associated unique operating characteristics in the proposed clustering method, a main advantage of this formulation is that it allows borrowing some well-established results from classification and regression, such as model selection criteria for selecting the number of clusters, a difficult problem in clustering analysis. In particular, we propose using generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method's promising performance.
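
The abstract does not spell out the objective, so the following is only a minimal sketch of what "clustering as penalized regression with grouping pursuit" could look like. It assumes one centroid per observation (`mu`), a squared-error fit term, and a truncated L1 penalty `min(|a|, tau)` on pairwise centroid distances as the non-convex group penalty. The names `tlp`, `penalized_objective`, `gcv`, `lam`, and `tau`, the GCV form RSS / (n*p - GDF)^2, and the toy data are all assumptions for illustration, not details taken from this page.

```python
import numpy as np

def tlp(a, tau):
    """Truncated L1 penalty min(|a|, tau); assumed non-convex penalty form."""
    return np.minimum(np.abs(a), tau)

def penalized_objective(X, mu, lam, tau):
    """Squared-error fit plus a grouping penalty on pairwise centroid distances.

    X   : (n, p) data matrix
    mu  : (n, p) matrix with one centroid per observation (assumed setup)
    lam : penalty weight (hypothetical name)
    tau : truncation threshold of the penalty (hypothetical name)
    """
    n = X.shape[0]
    fit = 0.5 * np.sum((X - mu) ** 2)
    pen = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Observations whose centroids coincide contribute zero penalty,
            # which is what groups them into one cluster.
            pen += tlp(np.linalg.norm(mu[i] - mu[j]), tau)
    return fit + lam * pen

def gcv(X, mu, gdf):
    """Assumed GCV form: residual sum of squares over (n*p - GDF)^2."""
    n, p = X.shape
    rss = np.sum((X - mu) ** 2)
    return rss / (n * p - gdf) ** 2

# Toy data: two nearby points share a centroid, one distant point stands alone.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
mu = np.array([[0.0, 1.0], [0.0, 1.0], [10.0, 10.0]])
objective_value = penalized_objective(X, mu, lam=0.5, tau=1.0)
gcv_value = gcv(X, mu, gdf=4.0)
```

In this sketch the number of clusters is the number of distinct rows of `mu`; under the stated assumptions, one would compare `gcv` across candidate fits and keep the smallest value. The GDF value passed in here is a placeholder, since estimating it is a separate step not covered by the sketch.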


Similar resources

PeakSegJoint: fast supervised peak detection via joint segmentation of multiple count data samples

Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose a supervised penalty learning model. T...


Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with...


Unsupervised Learning of Predictors from Unpaired Input-Output Samples

Unsupervised learning is the most challenging problem in machine learning and especially in deep learning. Among many scenarios, we study an unsupervised learning problem of high economic value — learning to predict without costly pairing of input data and corresponding labels. Part of the difficulty in this problem is a lack of solid evaluation measures. In this paper, we take a practical appr...


Multitask kernel-based learning with first-order logic constraints

In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admiss...


Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

We introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on...



Journal:

Journal of machine learning research : JMLR

Volume 14, Issue 7

Publication date: 2013
